CLICK: Clustering Categorical Data using K-partite Maximal Cliques

Authors

  • Markus Peters
  • Mohammed J. Zaki
Abstract

Clustering is one of the central data mining problems and numerous approaches have been proposed in this field. However, few of these methods focus on categorical data. The categorical techniques that do exist have significant shortcomings in terms of performance, the clusters they detect, and their ability to locate clusters in subspaces. This work introduces a novel algorithm called Click, which finds clusters in categorical datasets based on a search method for k-partite maximal cliques. Click is able to detect subspace clusters, and outperforms previous approaches by a factor of two to three. It scales better than any of the existing methods for high-dimensional datasets. These results are demonstrated in a comprehensive performance study on synthetic and real data sets.
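
The abstract does not spell out how the k-partite graph is built or searched, so the following is only an illustrative sketch of the general idea, not the Click algorithm itself. It assumes that each vertex is an (attribute index, value) pair, that two values of different attributes are linked when they co-occur in at least `min_support` rows (a simplifying assumption), and that maximal cliques are enumerated with a standard Bron-Kerbosch search with pivoting. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def build_kpartite_graph(rows, min_support):
    """Vertices are (attribute_index, value) pairs. Two vertices from
    different attributes are linked if they co-occur in at least
    min_support rows (an assumed, simplified linking criterion)."""
    counts = {}
    adj = {}
    for row in rows:
        items = list(enumerate(row))
        for item in items:
            adj.setdefault(item, set())
        # Pairs always come from different attributes, so the graph is
        # k-partite by construction (one partition per attribute).
        for a, b in combinations(items, 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    for (a, b), c in counts.items():
        if c >= min_support:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def support(rows, clique):
    """Number of rows that contain every (attribute, value) in the clique."""
    return sum(all(row[i] == v for i, v in clique) for row in rows)


# Toy data: three categorical attributes (color, size, shape).
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "small", "round"),
]
candidates = maximal_cliques(build_kpartite_graph(rows, min_support=2))
```

A clique only guarantees that its values co-occur pairwise, so a candidate cluster would still need to be checked against the data, for example with `support(rows, clique)`. A clique that covers fewer than all attributes corresponds to a candidate cluster in a subspace.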

Related resources

k-Partite cliques of protein interactions: A novel subgraph topology for functional coherence analysis on PPI networks.

Many studies are aimed at identifying dense clusters/subgraphs from protein-protein interaction (PPI) networks for protein function prediction. However, the prediction performance based on the dense clusters is actually worse than a simple guilt-by-association method using neighbor counting ideas. This indicates that the local topological structures and properties of PPI networks are still open...

Clustering Numerical and Categorical Data

Clustering is an important technique for data mining which allows us to discover unknown relationships in our data sets. Clustering algorithms that use metrics based on the natural ordering of numbers cannot be applied to categorical (non-numerical) data. In this tutorial we will review the main methods for numerical data clustering (K-Means, Hierarchical Clustering and Fuzzy C-Means) and then s...

Finding All Maximal Cliques in Dynamic Graphs

Clustering applications dealing with perception-based or biased data lead to models with non-disjunct clusters. There, objects to be clustered are allowed to belong to several clusters at the same time, which results in a fuzzy clustering. It can be shown that this is equivalent to searching all maximal cliques in dynamic graphs like $G_t = (V, E_t)$, where $E_{t-1} \subset E_t$, $t = 1, \ldots, T$; $E_0 = \emptyset$. In thi...

The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced ...

On finding k-cliques in k-partite graphs

In this paper, a branch-and-bound algorithm for finding all cliques of size k in a k-partite graph is proposed that improves upon the method of Grunert et al. (2002). The new algorithm uses bit-vectors, or bitsets, as the main data structure in bit-parallel operations. Bitsets enable a new form of data representation that improves branching and backtracking of the branch-and-bound procedure. Nume...
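
To make the bitset idea concrete, here is a minimal sketch of a branch-and-bound search for k-cliques in a k-partite graph. It is not the algorithm from the paper above, whose details are truncated here. It assumes vertices are numbered 0..n-1, uses Python integers as bitsets, branches on the partition with the fewest remaining candidates, and prunes as soon as some partition has none. The function name `k_cliques` and its input format are hypothetical.

```python
def k_cliques(parts, edges):
    """parts: list of k lists of vertex ids (0..n-1), one list per partition.
    edges: iterable of (u, v) pairs between vertices of different partitions.
    Returns every clique that takes exactly one vertex from each partition."""
    n = sum(len(p) for p in parts)
    adj = [0] * n  # adj[v] is a bitset of the neighbours of v
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    part_mask = [sum(1 << v for v in p) for p in parts]
    found = []

    def search(chosen, cand, remaining):
        if not remaining:
            found.append(tuple(sorted(chosen)))
            return
        # Bound: count surviving candidates per open partition and pick
        # the smallest one; zero candidates means this branch is dead.
        size, best = min(
            (bin(cand & part_mask[i]).count("1"), i) for i in remaining
        )
        if size == 0:
            return
        rest = [i for i in remaining if i != best]
        bits = cand & part_mask[best]
        while bits:
            low = bits & -bits          # isolate the lowest set bit
            v = low.bit_length() - 1    # index of that bit
            # One AND restricts the candidates to neighbours of v.
            search(chosen + [v], cand & adj[v], rest)
            bits ^= low

    search([], (1 << n) - 1, list(range(len(parts))))
    return found


# Toy 3-partite graph with partitions {0, 1}, {2, 3}, {4, 5}.
parts = [[0, 1], [2, 3], [4, 5]]
edges = [(0, 2), (0, 4), (2, 4), (1, 3), (1, 5), (3, 5), (0, 3)]
triples = k_cliques(parts, edges)
```

The point of the bitset representation in this sketch is that intersecting the candidate set with a vertex's neighbourhood is a single bitwise AND, and backtracking needs no explicit undo step because each recursive call receives its own integer.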

Publication date: 2004